AITopics | reconstruction and generation

Collaborating Authors

reconstruction and generation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Adversarial Symmetric Variational Autoencoder

Yuchen Pu, Weiyao Wang, Ricardo Henao, Liqun Chen, Zhe Gan, Chunyuan Li, Lawrence Carin

Neural Information Processing SystemsNov-21-2025, 08:32:53 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, deep learning, machine learning, (19 more...)

Neural Information Processing Systems

Country: North America > United States > California > Los Angeles County > Long Beach (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation

Zheng, Anlin, Wen, Xin, Zhang, Xuanyang, Ma, Chuofan, Wang, Tiancai, Yu, Gang, Zhang, Xiangyu, Qi, Xiaojuan

arXiv.org Artificial IntelligenceOct-28-2025

In this work, we present a novel direction to build an image tokenizer directly on top of a frozen vision foundation model, which is a largely underexplored area. Specifically, we employ a frozen vision foundation model as the encoder of our tokenizer. To enhance its effectiveness, we introduce two key components: (1) a region-adaptive quantization framework that reduces redundancy in the pre-trained features on regular 2D grids, and (2) a semantic reconstruction objective that aligns the tokenizer's outputs with the foundation model's representations to preserve semantic fidelity. Based on these designs, our proposed image tokenizer, VFMTok, achieves substantial improvements in image reconstruction and generation quality, while also enhancing token efficiency. It further boosts autoregressive (AR) generation -- achieving a gFID of 1.36 on ImageNet benchmarks, while accelerating model convergence by three times, and enabling high-fidelity class-conditional synthesis without the need for classifier-free guidance (CFG). The code is available at https://github.com/CVMI-Lab/VFMTok.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2507.08441

Country: Asia > China (0.28)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.85)

Add feedback

An Image is Worth 32 Tokens for Reconstruction and Generation

Neural Information Processing SystemsMay-27-2025, 20:18:54 GMT

Recent advancements in generative models have highlighted the crucial role of image tokenization in the efficient synthesis of high-resolution images. Tokenization, which transforms images into latent representations, reduces computational demands compared to directly processing pixels and enhances the effectiveness and efficiency of the generation process. Prior methods, such as VQGAN, typically utilize 2D latent grids with fixed downsampling factors. However, these 2D tokenizations face challenges in managing the inherent redundancies present in images, where adjacent regions frequently display similarities. To overcome this issue, we introduce Transformer-based 1-Dimensional Tokenizer (TiTok), an innovative approach that tokenizes images into 1D latent sequences.

reconstruction and generation, representation, worth 32, (8 more...)

Neural Information Processing Systems

Genre:

Research Report > Promising Solution (0.40)
Overview > Innovation (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.60)
Information Technology > Artificial Intelligence > Machine Learning (0.57)

Add feedback

GenFusion: Closing the Loop between Reconstruction and Generation via Videos

Wu, Sibo, Xu, Congrong, Huang, Binbin, Geiger, Andreas, Chen, Anpei

arXiv.org Artificial IntelligenceMar-29-2025

Recently, 3D reconstruction and generation have demonstrated impressive novel view synthesis results, achieving high fidelity and efficiency. However, a notable conditioning gap can be observed between these two fields, e.g., scalable 3D scene reconstruction often requires densely captured views, whereas 3D generation typically relies on a single or no input view, which significantly limits their applications. W e found that the source of this phenomenon lies in the misalignment between 3D constraints and generative priors. T o address this problem, we propose a reconstruction-driven video diffusion model that learns to condition video frames on artifact-prone RGB-D renderings. Moreover, we propose a cyclical fusion pipeline that iteratively adds restoration frames from the generative model to the training set, enabling progressive expansion and addressing the viewpoint saturation limitations seen in previous reconstruction and generation pipelines. Our evaluation, including view synthesis from sparse view and masked input, validates the effectiveness of our approach.

artificial intelligence, diffusion model, machine learning, (12 more...)

arXiv.org Artificial Intelligence

2503.21219

Country:

Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > China > Zhejiang Province (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

Adversarial Symmetric Variational Autoencoder

Yuchen Pu, Weiyao Wang, Ricardo Henao, Liqun Chen, Zhe Gan, Chunyuan Li, Lawrence Carin

Neural Information Processing SystemsOct-3-2024, 09:17:27 GMT

A new form of variational autoencoder (VAE) is developed, in which the joint distribution of data and codes is considered in two (symmetric) forms: (i) from observed data fed through the encoder to yield codes, and (ii) from latent codes drawn from a simple prior and propagated through the decoder to manifest data. Lower bounds are learned for marginal log-likelihood fits observed data and latent codes. When learning with the variational bound, one seeks to minimize the symmetric Kullback-Leibler divergence of joint density functions from (i) and (ii), while simultaneously seeking to maximize the two marginal log-likelihoods. To facilitate learning, a new form of adversarial training is developed. An extensive set of experiments is performed, in which we demonstrate state-of-the-art data reconstruction and generation on several image benchmark datasets.

adversarial network, joint distribution, learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States > California > Los Angeles County > Long Beach (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Explicitly Minimizing the Blur Error of Variational Autoencoders

Bredell, Gustav, Flouris, Kyriakos, Chaitanya, Krishna, Erdil, Ertunc, Konukoglu, Ender

arXiv.org Artificial IntelligenceApr-12-2023

Variational autoencoders (VAEs) are powerful generative modelling methods, however they suffer from blurry generated samples and reconstructions compared to the images they have been trained on. Significant research effort has been spent to increase the generative capabilities by creating more flexible models but often flexibility comes at the cost of higher complexity and computational cost. Several works have focused on altering the reconstruction term of the evidence lower bound (ELBO), however, often at the expense of losing the mathematical link to maximizing the likelihood of the samples under the modeled distribution. Here we propose a new formulation of the reconstruction term for the VAE that specifically penalizes the generation of blurry images while at the same time still maximizing the ELBO under the modeled distribution. We show the potential of the proposed loss on three different data sets, where it outperforms several recently proposed reconstruction losses for VAEs.

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2304.05939

Country:

North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Classify and Generate: Using Classification Latent Space Representations for Image Generations

Gopalakrishnan, Saisubramaniam, Singh, Pranshu Ranjan, Yazici, Yasin, Foo, Chuan-Sheng, Chandrasekhar, Vijay, Ambikapathi, ArulMurugan

arXiv.org Machine LearningDec-14-2021

Utilization of classification latent space information for downstream reconstruction and generation is an intriguing and a relatively unexplored area. In general, discriminative representations are rich in class-specific features but are too sparse for reconstruction, whereas, in autoencoders the representations are dense but have limited indistinguishable class-specific features, making them less suitable for classification. In this work, we propose a discriminative modeling framework that employs manipulated supervised latent representations to reconstruct and generate new samples belonging to a given class. Unlike generative modeling approaches such as GANs and VAEs that aim to model the data manifold distribution, Representation based Generations (ReGene) directly represent the given data manifold in the classification space. Such supervised representations, under certain constraints, allow for reconstructions and controlled generations using an appropriate decoder without enforcing any prior distribution. Theoretically, given a class, we show that these representations when smartly manipulated using convex combinations retain the same class label. Furthermore, they also lead to the novel generation of visually realistic images. Extensive experiments on datasets of varying resolutions demonstrate that ReGene has higher classification accuracy than existing conditional generative models while being competitive in terms of FID.

decoder, latent space representation, representation, (14 more...)

arXiv.org Machine Learning

doi: 10.1016/j.neucom.2021.10.090

2004.07543

Country: